
Learning Joint Representations of Videos and Sentences with Web Image Search


Abstract

Our objective is video retrieval based on natural language queries. In addition, we consider the analogous problem of retrieving sentences or generating descriptions given an input video. Recent work has addressed the problem by embedding visual and textual inputs into a common space where semantic similarities correlate to distances. We also adopt the embedding approach, and make the following contributions: First, we utilize web image search in the sentence embedding process to disambiguate fine-grained visual concepts. Second, we propose embedding models for sentence, image, and video inputs whose parameters are learned simultaneously. Finally, we show how the proposed model can be applied to description generation. Overall, we observe a clear improvement over the state-of-the-art methods in the video and sentence retrieval tasks. In description generation, the performance level is comparable to the current state-of-the-art, although our embeddings were trained for the retrieval tasks.
